Comparison of Approaches for Instrumentally Predicting the Quality of Text-to-Speech Systems: Data from Blizzard Challenges

Authors

  • Florian Hinterleitner
  • Sebastian Möller
  • Tiago H. Falk
  • Tim Polzehl
Abstract

In this paper, we compare and combine different approaches for instrumentally predicting the perceived quality of Text-to-Speech systems. First, a log-likelihood is determined by comparing features extracted from synthesized speech signals with features trained on natural speech. Second, parameters are extracted which capture quality-relevant degradations of the synthesized speech signal. Both approaches are combined and evaluated on synthetic speech databases with auditory ratings from the Blizzard Challenges 2008 and 2009. The results show that auditory quality judgments can be predicted with sufficiently high accuracy and reliability. In particular, ranking different synthesizer systems by their quality comes within reach.
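The log-likelihood idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a single diagonal-covariance Gaussian stands in for whatever reference model the authors trained on natural speech, the two-dimensional feature frames are made-up toy values, and the names `fit_diag_gaussian`, `avg_log_likelihood`, `system_A` and `system_B` are hypothetical.

```python
import math


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from natural-speech feature frames."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        # A small floor keeps the variance strictly positive.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def avg_log_likelihood(frames, means, variances):
    """Average per-frame log-likelihood of the frames under the diagonal Gaussian."""
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


# Toy, made-up feature frames; real features would come from a speech front end.
natural = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9]]
synth_close = [[0.05, 0.95], [0.15, 1.0]]  # hypothetical system close to natural speech
synth_far = [[1.5, -0.5], [2.0, -1.0]]     # hypothetical system far from natural speech

means, variances = fit_diag_gaussian(natural)
scores = {
    "system_A": avg_log_likelihood(synth_close, means, variances),
    "system_B": avg_log_likelihood(synth_far, means, variances),
}

# Rank the hypothetical systems from most to least natural-like.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch a higher average log-likelihood means the synthesized features look more like the natural-speech reference, which is what allows systems to be ordered. Mapping such scores to auditory quality judgments would need a separate step fitted on rated data, which is not shown here.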

Similar resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Comparison of approaches for instrumentally predicting the quality of text-to-speech systems

In this paper, we compare and combine different approaches for instrumentally predicting the perceived quality of Text-to-Speech systems. First, a log-likelihood is determined by comparing features extracted from the synthesized speech signal with features trained on natural speech. Second, parameters are extracted which capture quality-relevant degradations of the synthesized speech signal. Bo...

DNN-based Speech Synthesis for Indian Languages from ASCII text

Text-to-Speech synthesis in Indian languages has seen a lot of progress over the decade, partly due to the annual Blizzard challenges. These systems assume the text to be written in Devanagari or Dravidian scripts, which are nearly phonemic orthography scripts. However, the most common form of computer interaction among Indians is ASCII written transliterated text. Such text is generally noisy wi...

Towards Perceptual Quality Modeling of Synthesized Audiobooks – Blizzard Challenge

This paper reports on recent advances in the field of instrumental quality evaluation of text-to-speech (TTS) synthesis. In particular, a wide range of acoustic quality markers are analyzed concerning their quality-describing power using the audiobook data from the Blizzard Challenge 2012. Several approaches for perceptual modeling are investigated and compared with each other. The results reve...

Speech Database Speech Analysis Training of MSD-HSMM Excitation parameters Spectral parameters Speech signal Context-dependent MSD-HSMMs and duration models Speech Parameter Generation

This paper describes the text-to-speech synthesis system developed for the Blizzard Challenge 2016 by members of the ADAPT centre and colleagues from associated projects. The task was to build a synthetic voice for reading audiobooks to children, from a speech database of audiobooks around 5 hours long. Our entry system is an HMM-based parametric speech synthesizer which was built using a subse...

Publication date: 2010